pomp constructor: Constructor of the basic pomp object

Description

This function constructs a pomp object, encoding a partially-observed Markov process model together with a uni- or multi-variate time series. As such, it is central to all the package's functionality. One implements the model by specifying some or all of its basic components. These include:

The basic structure and its rationale are described in the Journal of Statistical Software paper, an updated version of which is to be found on the package website.

Usage

pomp(data, times, t0, ..., rprocess, dprocess, rmeasure, dmeasure, measurement.model, skeleton, skeleton.type, skelmap.delta.t, initializer, rprior, dprior, params, covar, tcovar, obsnames, statenames, paramnames, covarnames, zeronames, PACKAGE, fromEstimationScale, toEstimationScale, globals, cdir, cfile, shlib.args)

Arguments

data, times

required; the time series data and times at which observations are made. data should be given as a data-frame and times must indicate the column of observation times by name or index. times must be numeric and strictly increasing. Internally, data will be internally coerced to an array with storage-mode double.

In addition, a pomp object can be supplied in the data argument. In this case, the call to pomp will add element to, or replace elements of, the supplied pomp object.

The zero-time, at which the stochastic dynamical system is to be initialized. This must be no later than the time of the first observation, i.e.,

t0 <= times[1]<="" code="">.

rprocess, dprocess

optional; specification of the simulator and probability density evaluation function of the unobserved state process. See below under “The Unobserved Markov State-Process Model” for details.

Note: it is not typically necessary (or even feasible) to define dprocess. In fact, no current pomp inference algorithm makes use of dprocess. This functionality is provided only to support future algorithm development.

rmeasure, dmeasure, measurement.model

optional; specifications of the measurement model. See below under “The Measurement Model” for details.

skeleton

optional; the deterministic skeleton of the unobserved state process. See below under “The Deterministic Skeleton” for details.

skeleton.type, skelmap.delta.t

deprecated. These will be removed in a future release.

initializer

optional; draws from the distribution of initial values of the unobserved Markov state process. Specifically, given a vector of parameters, params and an initial time, t0, the initializer determines the state vector at time t0. See below under “The State-Process Initializer” for details.

rprior, dprior

optional; specification of the prior distribution on parameters. See below under “Specifying a Prior” for details.

params

optional; named numeric vector of parameters. This will be coerced internally to storage mode double.

covar, tcovar

optional data frame of covariates: covar is the table of covariates (one column per variable); tcovar the name or the index of the time variable.

If a covariate table is supplied, then the value of each of the covariates is interpolated as needed. The resulting interpolated values are made available to the appropriate basic components. Note that covar will be coerced internally to storage mode double. See below under “Covariates” for more details.

obsnames, statenames, paramnames, covarnames

optional character vectors specifying the names of observables, state variables, parameters, and covariates, respectively. These are used only in the event that one or more of the basic components are defined using C snippets or native routines. It is usually unnecessary to specify obsnames or covarnames, as these will by default be read from data and covars, respectively.

zeronames

optional character vector specifying the names of accumulator variables (see below under “Accumulator Variables”).

PACKAGE

optional string giving the name of the dynamically loaded library in which any native routines are to be found. This is only useful if one or more of the model components has been specified using a precompiled dynamically loaded library; it is not useful if the components are specified using C snippets.

fromEstimationScale, toEstimationScale

optional parameter transformations. Many algorithms for parameter estimation search an unconstrained space of parameters. When working with such an algorithm and a model for which the parameters are constrained, it can be useful to transform parameters. toEstimationScale and fromEstimationScale are transformations from the model scale to the estimation scale, and vice versa, respectively. See below under “Parameter Transformations” for more details.

globals

optional character; C code that will be included in the source for (and therefore hard-coded into) the shared-object library created when the call to pomp uses C snippets. If no C snippets are used, globals has no effect.

cdir, cfile, shlib.args

optional character variables. cdir specifies the name of the directory within which C snippet code will be compiled. By default, this is in a temporary directory specific to the running instance of R. cfile gives the name of the file (in directory cdir) into which C snippet codes will be written. By default, a random filename is used. The shlib.args can be used to pass command-line arguments to the R CMD SHLIB call that will compile the C snippets.

...

Any additional arguments given to pomp will be made available to each of the basic components. To prevent errors due to misspellings, a warning is issued if any such arguments are detected.

Value

The pomp constructor function returns an object, call it P, of class pomp. P contains, in addition to the data, any elements of the model that have been specified as arguments to the pomp constructor function. One can add or modify elements of P by means of further calls to pomp, using P as the first argument in such calls.

Important note

It is not typically necessary (or even feasible) to define all of the basic components for any given purpose. Each pomp algorithm makes use of only a subset of these components. Any algorithm requiring a component that is not present will generate an error letting you know that you have not provided a needed component.

Using C snippets to accelerate computations

pomp provides a facility whereby users can define their model's components using inline C code. Furnishing one or more C snippets as arguments to the pomp constructor causes them to be written to a C file stored in the R session's temporary directory, which is then compiled (via R CMD SHLIB) into a dynamically loadable shared object file. This is then loaded as needed. Note to Windows and Mac users: By default, your R installation may not support R CMD SHLIB. The package website contains installation instructions that explain how to enable this powerful feature of R.

General rules for writing C snippets

In writing a C snippet one must bear in mind both the goal of the snippet, i.e., what computation it is intended to perform, and the context in which it will be executed. These are explained here in the form of general rules. Additional specific rules apply according to the function of the particular C snippet. Illustrative examples are given in the tutorials on the package website.

C snippets must be valid C. They will embedded verbatim in a template file which will then be compiled by a call to R CMD SHLIB. If the resulting file does not compile, an error message will be generated. No attempt is made by pomp to interpret this message. Typically, compilation errors are due to either invalid C syntax or undeclared variables.
State variables, parameters, observables, and covariates must be left undeclared within the snippet. State variables and parameters are declared via the statenames or paramnames arguments to pomp, respectively. Compiler errors that complain about undeclared state variables or parameters are usually due to failure to declare these in statenames or paramnames, as appropriate.
A C snippet can declare local variables. Be careful not to use names that match those of state variables, observables, or parameters. The latter must never be declared within a C snippet.
Names of observables are determined by their names in the data. They must be referred to in measurement model C snippets (rmeasure and dmeasure) by those names.
If the pomp object contains a table of covariates (see above), then the variables in the covariate table will be available, by their names, in the context within which the C snippet is executed.
Because the dot ‘.’ has syntactic meaning in C, R variables with names containing dots (‘.’) are replaced in the C codes by variable names in which all dots have been replaced by underscores (‘_’).
The header ‘R.h’, provided with R, will be included in the generated C file, making all of the R C API available for use in the C snippet. This makes a great many useful functions available, including all of R's statistical distribution functions.
The header ‘pomp.h’, provided with pomp, will also be included, making all of the pomp C API available for use in every C snippet. Do
```
file.show(system.file("include/pomp.h",package="pomp"))
```
to view this header file.
Snippets of C code passed to the globals argument of pomp will be included at the head of the generated C file. This can be used to declare global variables, define useful functions, and include arbitrary header files.

The Unobserved Markov State-Process Model

Specification of process-model codes rprocess and/or dprocess is facilitated by pomp's so-called plug-ins, which allow one to easily specify the most common kinds of process model. Discrete-time processes If the state process evolves in discrete time, specify rprocess using the discrete.time.sim plug-in. Specifically, provide

      rprocess = discrete.time.sim(step.fun = f, delta.t)

to pomp, where f is a C snippet or R function that takes simulates one step of the state process. The former is the preferred option, due to its much greater computational efficiency. The goal of such a C snippet is to replace the state variables with their new random values at the end of the time interval. Accordingly, each state variable should be over-written with its new value. In addition to the states, parameters, covariates (if any), and observables, the variables t and dt, containing respectively the time at the beginning of the step and the step's duration, will be defined in the context in which the C snippet is executed. See below under “General rules for C snippet writing” for more details. Examples are to be found in the tutorials on the package website. If f is given as an R function, it should have prototype

      f(x, t, params, delta.t, ...)

When f is called, x will be a named numeric vector containing the value of the state process at time t, params will be a named numeric vector containing parameters, and delta.t will be the time-step. It should return a named vector of the same length, and with the same set of names, as x, representing a draw from the distribution of the state process at time t+delta.t, conditional on its having value x at time t. Continuous-time processes If the state process evolves in continuous time, but you can use an Euler approximation, specify rprocess using the euler.sim plug-in. Furnish

      rprocess = euler.sim(step.fun = f, delta.t)

to pomp in this case. As before, f can be provided either as a C snippet or as an R function, the former resulting in much quicker computations. The form of f will be the same as above (in the discrete-time case). If you have a procedure that allows you, given the value of the state process at any time, to simulate it at an arbitrary time in the future, use the onestep.sim plug-in. To do so, furnish

      rprocess = onestep.sim(step.fun = f)

to pomp. Again, f can be provided either as a C snippet or as an R function, the former resulting in much quicker computations. The form of f should be as above (in the discrete-time or Euler cases). If you desire exact simulation of certain continuous-time Markov chains, an implementation of Gillespie's algorithm (Gillespie 1977) is available, via the gillespie.sim plug-in. In this case, furnish

      rprocess = gillespie.sim(rate.fun = f, v, d)

to pomp, where f gives the rates of the elementary events. Here, f must be an R function of the form

      f(j, x, t, params, ...)

When f is called, the integer j will be the number of the elementary event (corresponding to the columns of matrices v and d, see below), x will be a named numeric vector containing the value of the state process at time t and params is a named numeric vector containing parameters. f should return a single numerical value, representing the rate of that elementary event at that point in state space and time. Matrices v and d specify the continuous-time Markov process in terms of its elementary events. Each should have dimensions nvar x nevent, where nvar is the number of state variables and nevent is the number of elementary events. v describes the changes that occur in each elementary event: it will usually comprise the values 1, -1, and 0 according to whether a state variable is incremented, decremented, or unchanged in an elementary event. d is a binary matrix that describes the dependencies of elementary event rates on state variables: d[i,j] will have value 1 if event rate j must be updated as a result of a change in state variable i and 0 otherwise. A faster, but approximate, version of the Gillepie algorithm, the so-called “K-leap” method of Cai and Xu (2007), is implemented in the kleap.sim plug-in. To use it, supply

      rprocess = kleap.sim(rate.fun, e, v, d)

to pomp, where rate.fun, v, and d are as above, and e gives relative error tolerances for each of the state variables. e should have length equal to the number of state variables. e[i] corresponds to row i of the v and d matrices and we must have $0 <= e[i]="" <="1$." the="" leap="" size,="" $k$,="" is="" chosen="" so="" that="" $k="" size="" of="" time="" step="" simulator="" plug-ins="" discrete.time.sim, euler.sim, and onestep.sim all work by taking discrete time steps. They differ as to how this is done. Specifically,

onestep.sim takes a single step to go from any given time t1 to any later time t2 (t1 < t2). Thus, this plug-in is designed for use in situations where a closed-form solution to the process exists.
To go from t1 to t2, euler.sim takes n steps of equal size, where
```
	n = ceiling((t2-t1)/delta.t).
```
discrete.time.sim assumes that the process evolves in discrete time, where the interval between successive times is delta.t. Thus, to go from t1 to t2, discrete.time.sim takes n steps of size exactly delta.t, where
```
	n = floor((t2-t1)/delta.t).
```

Specifying dprocess If you have a procedure that allows you to compute the probability density of an arbitrary transition from state $x1$ at time $t1$ to state $x2$ at time $t2$, assuming that the state remains unchanged between $t1$ and $t2$, then you can use the onestep.dens plug-in. This is accomplished by furnishing

      dprocess = onestep.dens(dens.fun = f)

to pomp, where f is an R function with prototype

      f(x1, x2, t1, t2, params, ...)

When f is called, x1 and x2 will be named numeric vectors containing the values of the state process at times t1 and t2, respectively, and params will be a named numeric vector containing parameters. f should return the log likelihood of a transition from x1 at time t1 to x2 at time t2, assuming that no intervening transitions have occurred. To see examples, consult the tutorials on the package website.

The Measurement Model

The measurement model is the link between the data and the unobserved state process. It can be specified either by using one or both of the rprocess and dprocess arguments, or via the measurement.model argument. If measurement.model is given it overrides any specification via the rmeasure or dmeasure arguments, with a warning. The best way to specify the measurement model is by giving C snippets for rmeasure and dmeasure. In writing an rmeasure C snippet, bear in mind that:

The goal of such a snippet is to fill the observables with random values drawn from the measurement model distribution. Accordingly, each observable should be assigned a new value.
In addition to the states, parameters, covariates (if any), and observables, the variable t, containing the time of the observation, will be defined in the context in which the snippet is executed.

General rules for writing C snippets are provided below. The tutorials on the package website give examples as well. It is also possible, though far less efficient, to specify rmeasure using an R function. In this case, specify the measurement model simulator by furnishing

    rmeasure = f

to pomp, where f is an R function with prototype

    f(x, t, params, \dots)

It can also take any additional arguments if these are passed along with it in the call to pomp. When f is called,

x will be a named numeric vector of length nvar, the number of state variables.
t will be a scalar quantity, the time at which the measurement is made.
params will be a named numeric vector of length npar, the number of parameters.

f must return a named numeric vector of length nobs, the number of observable variables. In writing a dmeasure C snippet, observe that:

In addition to the states, parameters, covariates (if any), and observables, the variable t, containing the time of the observation, and the Boolean variable give_log will be defined in the context in which the snippet is executed.
The goal of such a snippet is to set the value of the lik variable to the likelihood of the data given the state. Alternatively, if give_log == 1, lik should be set to the log likelihood.

If dmeasure is to be provided instead as an R function, this is accomplished by supplying

    dmeasure = f

to pomp, where f has prototype

    f(y, x, t, params, log, \dots)

Again, it can take additional arguments that are passed with it in the call to pomp. When f is called,

y will be a named numeric vector of length nobs containing values of the observed variables;
x will be a named numeric vector of length nvar containing state variables;
params will be a named numeric vector of length npar containing parameters;
t will be a scalar, the corresponding observation time.

f must return a single numeric value, the probability density of y given x at time t. If log == TRUE, then f should return instead the log of the probability density. Note: it is a common error to fail to account for both log = TRUE and log = FALSE when writing the dmeasure C snippet or function. One can also specify both the rmeasure and dmeasure components at once via the measurement.model argument. It should be a formula or list of nobs formulae. These are parsed internally to generate rmeasure and dmeasure functions. Note: this is a convenience function, primarily designed to facilitate model exploration; it will typically be possible (and as a practical matter necessary) to accelerate measurement model computations by writing dmeasure and/or rmeasure using C snippets.

The Deterministic Skeleton

The skeleton is a dynamical system that expresses the central tendency of the unobserved Markov state process. As such, it is not uniquely defined, but can be both interesting in itself and useful in practice. In pomp, the skeleton is used by trajectory and traj.match. If the state process is a discrete-time stochastic process, then the skeleton is a discrete-time map. To specify it, provide

    skeleton = map(f, delta.t)

to pomp, where f implements the map and delta.t is the size of the timestep covered at one map iteration. If the state process is a continuous-time stochastic process, then the skeleton is a vectorfield (i.e., a system of ordinary differential equations). To specify it, supply

    skeleton = vectorfield(f)

to pomp, where f implements the vectorfield, i.e., the right-hand-size of the differential equations. In either case, f can be furnished either as a C snippet (the preferred choice), or an R function. In writing a skeleton C snippet, be aware that:

For each state variable, there is a corresponding component of the deterministic skeleton. The goal of such a snippet is to compute all the components.
When the skeleton is a map, the component corresponding to state variable x is named Dx and is the new value of x after one iteration of the map.
When the skeleton is a vectorfield, the component corresponding to state variable x is named Dx and is the value of $dx/dt$.
As with the other C snippets, all states, parameters and covariates, as well as the current time, t, will be defined in the context within which the snippet is executed.

The tutorials on the package website give some examples. If f is an R function, it must be of prototype

    f(x, t, params, \dots)

where, as usual,

x is a numeric vector (length nvar) containing the coordinates of a point in state space at which evaluation of the skeleton is desired.
t is a scalar value giving the time at which evaluation of the skeleton is desired.
params is a numeric vector (length npar) holding the parameters.

As with the other basic components, f may take additional arguments, provided these are passed along with it in the call to pomp. The function f must return a numeric vector of the same length as x, which contains the value of the map or vectorfield at the required point and time.

The State-Process Initializer

To fully specify the unobserved Markov state process, one must give its distribution at the zero-time (t0). By default, pomp assumes that this initial distribution is concentrated on a single point. In particular, any parameters in params, the names of which end in “.0”, are assumed to be initial values of states. When the state process is initialized, these are simply copied over as initial conditions. The names of the resulting state variables are obtained by dropping the “.0” suffix. One can override this default behavior by furnishing a value for the initializer argument of pomp. As usual, this can be provided either as a C snippet or as an R function. In the former case, bear in mind that:

The goal of a this snippet is the construction of a state vector, i.e., the setting of the dynamical states at time $\code{t0}$.
In addition to the parameters and covariates (if any), the variable t, containing the zero-time, will be defined in the context in which the snippet is executed.
NB: The statenames argument plays a particularly important role when the initializer is specified using a C snippet. In particular, every state variable must be named in statenames. Failure to follow this rule will result in undefined behavior.

If an R function is to be used, pass

    initializer = f

to pomp, where f is a function with prototype

    f(params, t0, \dots)

When f is called,

params will be a named numeric vector of parameters.
t0 will be the time at which initial conditions are desired.

As usual, f may take additional arguments, provided these are passed along with it in the call to pomp. f must return a named numeric vector of initial states. It is of course important that the names of the states match the expectations of the other basic components. Note that the state-process initializer can be either deterministic (the default) or stochastic. In the latter case, it samples from the distribution of the state process at the zero-time, t0.

Specifying a Prior

A prior distribution on parameters is specified by means of the rprior and/or dprior arguments to pomp. As with the other basic model components, it is preferable to specify these using C snippets. In writing a C snippet for the prior sampler (rprior), keep in mind that:

Within the context in which the snippet will be evaluated, only the parameters will be defined.
The goal of such a snippet is the replacement of parameters with values drawn from the prior distribution.
Hyperparameters can be included in the ordinary parameter list. Obviously, hyperparameters should not be replaced with random draws.

In writing a C snippet for the prior density function (dprior), observe that:

Within the context in which the snippet will be evaluated, only the parameters and give_log will be defined.
The goal of such a snippet is computation of the prior probability density, or the log of same, at a given point in parameter space. This scalar value should be returned in the variable lik. When give_log == 1, lik should contain the log of the prior probability density.
Hyperparameters can be included in the ordinary parameter list.

Alternatively, one can furnish R functions for one or both of these arguments. In this case, rprior must be a function of prototype

    f(params, \dots)

that makes a draw from the prior distribution given params and returns a named vector of the same length and with the same set of names, as params. The dprior function must be of prototype

    f(params, log = FALSE, \dots).

Its role is to evaluate the prior probability density (or log density if log == TRUE) and return that single scalar value.

Covariates

If the pomp object contains covariates (specified via the covar argument; see above), then interpolated values of the covariates will be available to each of the model components whenever it is called. In particular, variables with names as they appear in the covar data frame will be available to any C snippet. When a basic component is defined using an R function, that function will be called with an extra argument, covars, which will be a named numeric vector containing the interpolated values from the covariate table. An exception to this rule is the prior (rprior and dprior): covariate-dependent priors are not allowed. Nor are parameter transformations permitted to depend upon covariates.

Parameter Transformations

When parameter transformations are desired, they can be integrated into the pomp object via the toEstimationScale and fromEstimationScale arguments. As with the basic model components, these should ordinarily be specified using C snippets. When doing so, note that:

The parameter transformation mapping a parameter vector from the scale used by the model codes to another scale is specified using the toEstimationScale argument whilst the transformation mapping a parameter vector from the alternative scale to that on which the model is defined is specified with the fromEstimationScale argument.
The goal of these snippets is the computation of the values of the transformed parameters. The value of transformed parameter p should be assigned to variable Tp.
Time-, state-, and covariate-dependent transformations are not allowed. Therefore, neither the time, nor any state variables, nor any of the covariates will be available in the context within which a parameter transformation snippet is executed.

These transformations can also be specified using R functions with arguments params and .... In this case, toEstimationScale should transform parameters from the scale that the basic components use internally to the scale used in estimation. fromEstimationScale should be the inverse of toEstimationScale. Note that it is the user's responsibility to make sure that the transformations are mutually inverse. If obj is the constructed pomp object, and coef(obj) is non-empty, a simple check of this property is

    x <- coef(obj, transform = TRUE)
    obj1 <- obj
    coef(obj1, transform = TRUE) <- x
    identical(coef(obj), coef(obj1))
    identical(coef(obj1, transform=TRUE), x)

By default, both functions are the identity transformation.

Accumulator Variables

In formulating models, one sometimes wishes to define a state variable that will accumulate some quantity over the interval between successive observations. pomp provides a facility to make such features more convenient. Specifically, variables named in the pomp's zeronames argument will be set to zero immediately following each observation. See euler.sir and the tutorials on the package website for examples.

Viewing generated C code

It can be useful to view the C code generated by calling pomp with one or more C snippet arguments. You can set cdir and cfile to control where this code is written. Alternatively, set options(verbose=TRUE) before calling pomp. This will cause a message giving the name of the generated C file (in the session temporary directory) to be printed.

Warning

Some error checking is done by pomp, but complete error checking for arbitrary models is impossible. If the user-specified functions do not conform to the above specifications, then the results may be invalid. In particular, if both rmeasure and dmeasure are specified, the user should verify that these two functions correspond to the same probability distribution. If skeleton is specified, the user is responsible for verifying that it corresponds to a deterministic skeleton of the model.

References

A. A. King, D. Nguyen, and E. L. Ionides (2016) Statistical Inference for Partially Observed Markov Processes via the R Package pomp. Journal of Statistical Software 69(12): 1--43.

D. T. Gillespie (1977) Exact stochastic simulation of coupled chemical reactions. Journal of Physical Chemistry 81:2340--2361. X. Cai and Z. Xu (2007) K-leap method for accelerating stochastic simulation of coupled chemical reactions. Journal of Chemical Physics 126:074102.

Examples

Run this code


## pomp encoding a stochastic Ricker model with a covariate:

pomp(data = data.frame(t = 1:100, y = NA),
     times = "t", t0 = 0,
     covar = data.frame(t=0:100,K=seq(from=50,to=200,length=101)),
     tcovar = "t",
     rprocess = discrete.time.sim(Csnippet("double e = rnorm(0,sigma);
                                            n = r*n*exp(1-n/K+e);"), delta.t = 1),
     skeleton = map(Csnippet("Dn = r*n*exp(1-n/K);"), delta.t = 1),
     rmeasure = Csnippet("y = rpois(n);"),
     dmeasure = Csnippet("lik = dpois(y,n,give_log);"),
     rprior = Csnippet("r = rgamma(1,1); sigma = rgamma(1,1);"),
     dprior = Csnippet("lik = dgamma(r,1,1,1) + dgamma(sigma,1,1,1);
                        if (!give_log) lik = exp(lik);"),
     initializer = Csnippet("n = n_0;"),
     toEstimationScale = Csnippet("Tr = log(r); Tsigma = log(sigma);"),
     fromEstimationScale = Csnippet("Tr = exp(r); Tsigma = exp(sigma);"),
     paramnames = c("n_0", "r", "sigma"),
     statenames = "n") -> rick

## fill it with simulated data:

rick <- simulate(rick, params = c(r=17, sigma = 0.1, n_0 = 50))
plot(rick)     

## Not run: 
#     pompExample()
#     demos(package="pomp")
# ## End(Not run)

Run the code above in your browser using DataLab